Abstract: We consider the decentralized binary hypothesis testing problem on trees of bounded degree and increasing depth. For a regular tree of depth $t$ and branching factor $k \ge 2$, we assume that the leaves have access to independent and identically distributed noisy observations of the 'state of the world' $s$. Starting with the leaves, each node makes a decision in a finite alphabet $\mathcal{M}$, which it sends to its parent in the tree. Finally, the root decides between the two possible states of the world based on the information it receives.

We prove that the error probability vanishes only subexponentially in the number of available observations, under quite general hypotheses. More precisely, in the case of binary messages, decay is subexponential for any decision rule. For a general (finite) message alphabet $\mathcal{M}$, decay is subexponential for 'node-oblivious' decision rules that satisfy a mild irreducibility condition. In the latter case, we propose a family of decision rules with close-to-optimal asymptotic behavior.

I. Introduction 

Let $G = (V, E)$ be a (possibly infinite) network rooted at node 0. Assume that independent and identically distributed noisy observations of a hidden random variable $s \in \{0, 1\}$ are available at a subset $U \subseteq V$ of the vertices. Explicitly, each $i \in U$ has access to a private signal $x_i \in \mathcal{X}$, where $\{x_i\}_{i \in U}$ are independent and identically distributed, conditional on $s$. The 'state of the world' $s$ is drawn from a prior probability distribution $\pi = (\pi_0, \pi_1)$. The objective is to aggregate information about $s$ at the root node under communication constraints encoded by the network structure, while minimizing the error probability at 0.

We ask the following question: 

How much does the error probability at the root 
node increase due to these communication con- 
straints? 

In order to address this question, consider a sequence of information aggregation problems indexed by $t$. Information is revealed in a subset of the vertices $U_t \subseteq V$. There are $t$ rounds in which information aggregation occurs. In each round, a subset of the nodes in $V$ make 'decisions' that are broadcast to their neighbors. In the initial round, nodes $i \in U_t$ with distance $d(0, i) = t$ (with $d(\cdot, \cdot)$ being the graph distance) broadcast a decision $\sigma_i \in \mathcal{M}$ to their neighbors, with $\mathcal{M}$ a finite alphabet. In the next round, nodes $i \in V$ with distance $d(0, i) = t - 1$ broadcast a decision $\sigma_i \in \mathcal{M}$ to their neighbors. And so on, until the neighbors of the root announce their decisions in round $t$. Finally, the root makes its decision. The decision of any node $i$ is a function of the decisions of $i$'s neighbors in earlier rounds and, if $i \in U_t$, of the private signal $x_i$ received by $i$.

Clearly, the root can possibly access only the private information available at nodes $i \in V$ with $d(0, i) \le t$. We can therefore assume, without loss of generality, that $U_t \subseteq \{i \in V : d(0, i) \le t\}$. It is convenient to think of $U_t$ as the information horizon at time $t$.

Consider first the case in which communication is unconstrained. This can be modeled by considering the graph with vertices $V = \{0, 1, 2, 3, \dots\}$ and edges $E = \{(0, 1), (0, 2), (0, 3), \dots\}$. In other words, this is a star network, with the root at the center. Without loss of generality, we take $U_t = \{1, \dots, |U_t|\}$, with $|U_t| \uparrow \infty$ as $t \to \infty$.

A simple procedure for information aggregation would work as follows. Each node $i$ computes the log-likelihood ratio (LLR) $\ell(x_i)$ corresponding to the observed signal, and quantizes it to a value $\sigma_i$. The root adds up the quantized LLRs and decides on the basis of this sum. It follows from basic large deviation theory [1] that, under mild regularity assumptions, the error probability decreases exponentially in the number of observations:

$$P(\sigma_0 \neq s) = \exp\{-\Theta(|U_t|)\}. \tag{1}$$
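As a rough numerical illustration of Eq. (1), the sketch below computes the exact root error on a star network in one simple special case. It assumes binary signals that equal $s$ flipped with probability `delta`, one-bit messages $\sigma_i = x_i$, and a majority vote at the root; these choices and all function names are illustrative assumptions, not a procedure taken from the text.

```python
from math import comb, log, sqrt

def star_majority_error(n, delta):
    """Exact error probability when the root takes a majority vote over n
    one-bit messages, each equal to s flipped with probability delta.
    n is assumed odd so that no tie can occur."""
    return sum(comb(n, j) * delta**j * (1 - delta)**(n - j)
               for j in range((n + 1) // 2, n + 1))

delta = 0.1
# Chernoff exponent for a majority of delta-noisy bits.
chernoff = -log(2 * sqrt(delta * (1 - delta)))
sizes = [101, 201, 401]
# Empirical decay rate -log(P_err)/n for growing numbers of observations.
rates = [-log(star_majority_error(n, delta)) / n for n in sizes]
for n, r in zip(sizes, rates):
    print(n, round(r, 4), round(chernoff, 4))
```

Under these assumptions the per-observation rate stays bounded away from zero as $n$ grows, which is the exponential decay that Eq. (1) describes.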



This result is extremely robust: 

(1) It holds for any non-trivial alphabet $|\mathcal{M}| \ge 2$;

(2) Using concentration-of-measure arguments [2], [3] it is easy to generalize it to families of weakly dependent observations [4];

(3) It can be generalized to network structures $G$ with weak communication constraints. For instance, [5] proved that the error probability decays exponentially in the number of observations for trees of bounded depth. The crucial observation here is that such networks have large degree, diverging with the number of vertices. In particular, for a tree of depth $t$, the maximum degree is at least $n^{1/t}$.

At the other extreme, Hellman and Cover [6] considered the case of a line network. In our notation, we have $V = \{0, 1, 2, 3, \dots\}$, $E = \{(0, 1), (1, 2), (2, 3), \dots\}$, and $U_t = \{1, 2, \dots, t\}$. In [6] they proved that, as long as the LLRs are bounded (namely $|\ell(x_i)| \le C$ almost surely for some constant $C$), and the decision rule is independent of the node, the error probability remains bounded away from 0 as $t \to \infty$.

If the decision rule is allowed to depend on the node, the error probability can vanish as $t \to \infty$ provided $|\mathcal{M}| \ge 3$ [7], [8]. Despite this, even if the probability of error decays to 0, it does so much more slowly than for highly connected networks. Namely, Tay, Tsitsiklis and Win [9] proved that

$$P(\sigma_0 \neq s) = \exp\{-O(|U_t|^{\rho})\} \tag{2}$$

for some $\rho < 1$. In other words, the communication constraint is so severe that, after $t$ steps, the amount of information effectively used by the root is equivalent to a vanishingly small fraction of the one within the 'information horizon'.

These limit cases naturally lead to the general question: Given a rooted network $(G, 0)$, a sequence of information horizons $\{U_t\}_{t \ge 1}$ and a finite alphabet $\mathcal{M}$, can information be aggregated at the root in such a way that the error probability decays exponentially in $|U_t|$? The question is wide open, in particular for networks with average degree bounded or increasing slowly (e.g. logarithmically) with the system size.

Networks with moderate degree arise in a number of prac- 
tical situations. Within decentralized detection applications, 
moderate degree is a natural assumption for interference- 
limited wireless networks. In particular, systems in which a 
single root node communicates with a significant fraction of 
the sensors are likely to scale poorly because of interference 
at the root. Standard models for wireless ad hoc networks 
[10] are indeed based on random geometric graphs whereby 
each node is connected to a logarithmic number of neighbors. 

A different domain of applications for models of decen- 
tralized decision making is social learning [11]. In this case, 
each node corresponds to an agent, and the underlying graph 
is the social network across which information is exchanged. 
Also in this case, it is reasonable to assume that each agent 
has a number of neighbors which is bounded, or diverges 
slowly as the total number of agents grows. In many graph- 
theoretic models of social networks [12], although a small 
number of nodes can have large degree, the average degree 
is bounded or grows logarithmically with the network size. 

Given the slow progress with extreme network structures (line networks and highly-connected networks), the study of general moderate degree networks appears extremely challenging. In this paper we focus on regular trees. More precisely, we let $G$ be the (infinite) regular tree with branching factor $k$, rooted at 0 (each node has $k$ descendants and, with the exception of the root, one parent). The information horizon $U_t$ is formed by all the nodes at distance $t$ from the root, hence $|U_t| = k^t$. Under a broad set of assumptions, we prove that the probability of error decays subexponentially in the size of the information set, cf. Eq. (2), where $\rho = \rho_m < 1$ depends on the size of the alphabet $|\mathcal{M}| = m$.

More precisely, we establish subexponential convergence 
in the following cases: 

1) For binary messages $|\mathcal{M}| = 2$ and any choice of the decision rule. In fact, we obtain a precise characterization of the smallest possible error probability in this case.

2) For general message alphabet $3 \le |\mathcal{M}| < \infty$ provided the decision rule does not depend on the node, and satisfies a mild 'irreducibility' condition (see Section IV-B for a definition).



In the latter case, one expects that exponential convergence is recovered as the message set gets large. Indeed we prove that the optimal exponent in Eq. (2) obeys

$$1 - \frac{C_1}{|\mathcal{M}|} \le \rho_{\mathcal{M}} \le 1 - \exp\{-C_2 |\mathcal{M}|\}. \tag{3}$$

The upper bound follows from our general proof for irreducible decision rules, while the lower bound is obtained by constructing an explicit decision rule that achieves it.

Our investigation leaves several interesting open problems. First, it would be interesting to compute the optimal exponent $\rho = \rho(k, |\mathcal{M}|)$ for given degree of the tree and size of the alphabet. Even the behavior of the exponent for large alphabet sizes is unknown at the moment (cf. Eq. (3)). Second, the question of characterizing the performance limits of general, node-dependent decision rules remains open for $|\mathcal{M}| \ge 3$. Third, it would be interesting to understand the case where non-leaf nodes also get private signals, e.g., $U_t = \{i : i \in V,\ d(0, i) \le t\}$. Finally, this paper focuses on trees of bounded degree. It would be important to explore generalizations to other graph structures, namely trees with slowly diverging degrees (which could be natural models for the local structure of preferential attachment graphs [13]), and loopy graphs. Our current results can be extended to trees of diverging degree only in the case of binary signals. In this case we obtain that the probability of error is subexponential,

$$P(\sigma_0 \neq s) = \exp\{-o(|U_t|)\} \tag{4}$$

as soon as the degree is sub-polynomial, i.e. $k = o(n^a)$ for all $a > 0$.

The rest of the paper is organized as follows: Section II defines formally the model for information aggregation. Section III presents our results for binary messages $|\mathcal{M}| = 2$. Section IV treats the case of decision rules that do not depend on the node, with general $\mathcal{M}$.

II. Model Definition 

As mentioned in the introduction, we assume the network $G = (V, E)$ to be an (infinite) rooted $k$-ary tree, i.e. a tree whereby each node has $k$ descendants and one parent (with the exception of the root, which has no parent). Independent noisy observations ('private signals') of the state of the world $s$ are provided to all the nodes at the $t$-th generation $U_t = \{i \in V : d(0, i) = t\}$. These will also be referred to as the 'leaves'. Define $n = |U_t| = k^t$. Formally, the state of the world $s \in \{0, 1\}$ is drawn according to the prior $\pi$ and for each $i \in U_t$ an independent observation $x_i \in \mathcal{X}$ is drawn with probability distribution $p_0(\cdot)$ (if $s = 0$) or $p_1(\cdot)$ (if $s = 1$). For notational simplicity we assume that $\mathcal{X}$ is finite, and that $p_0(x), p_1(x) > 0$ for all $x \in \mathcal{X}$. Also, we exclude degenerate cases by taking $\pi_0, \pi_1 > 0$. We refer to the two events $\{s = 0\}$ and $\{s = 1\}$ as the hypotheses $H_0$ and $H_1$.

In round 0, each leaf $i$ sends a message $\sigma_i \in \mathcal{M}$ to its parent at level 1. In round 1, each node $j$ at level 1 sends a message $\sigma_j \in \mathcal{M}$ to its parent at level 2. Similarly up to round $t$. Finally, the root node makes a decision $\sigma_0 \in \{0, 1\}$ based on the $k$ messages it receives. The objective is to minimize $P_{\rm err} \equiv P(\sigma_0 \neq s)$. We call a set of decision rules optimal if it minimizes $P_{\rm err}$.

We will denote by $\partial i$ the set of children of node $i$. We denote the probability of events under $H_0$ by $P_0(\cdot)$, and the probability of events under $H_1$ by $P_1(\cdot)$. Finally, we denote by $f_i$ the decision rule at node $i$ in the tree. If $i$ is not a leaf node and $i \neq 0$, then $f_i : \mathcal{M}^k \to \mathcal{M}$. The root makes a binary decision $f_0 : \mathcal{M}^k \to \{0, 1\}$. If $i$ is a leaf node, it maps its private signal to a message, $f_i : \mathcal{X} \to \mathcal{M}$. In general, the $f_i$'s can be randomized.

III. Binary messages 

In this section, we consider the case $\mathcal{M} = \{0, 1\}$, i.e., the case of binary messages.

Consider the case $\pi_0 = \pi_1 = 1/2$, $\mathcal{X} = \{0, 1\}$ and $p_s(x) = (1 - \delta)\,\mathbb{I}(x = s) + \delta\,\mathbb{I}(x \neq s)$ for $s = 0, 1$, where $\delta \in (0, 1/2)$. Define the majority decision rule at non-leaf node $i$ as follows: $\sigma_i$ takes the value of the majority of $\sigma_{\partial i}$ (ties are broken uniformly at random).

It is not hard to see that if we implement majority updates at all non-leaf nodes, we achieve

$$P_{\rm maj}(\sigma_0 \neq s) = \exp\{-\Omega(\lfloor (k + 1)/2 \rfloor^{t})\}. \tag{5}$$

Note that this is an upper bound on error probability under majority updates.
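The growth rate in Eq. (5) can be checked numerically. The sketch below is a minimal illustration, assuming the symmetric binary setting just described, in which every node at a given level errs with the same conditional probability; the function names are hypothetical.

```python
from math import comb, log

def majority_error(e, k):
    """Error probability of a majority vote over k i.i.d. child messages,
    each wrong with probability e (ties broken by a fair coin)."""
    total = 0.0
    for wrong in range(k + 1):
        p = comb(k, wrong) * e**wrong * (1 - e)**(k - wrong)
        if 2 * wrong > k:
            total += p
        elif 2 * wrong == k:
            total += 0.5 * p
    return total

def error_by_level(delta, k, depth):
    """Error probabilities at levels 0 (leaves), 1, ..., depth."""
    errs = [delta]
    for _ in range(depth):
        errs.append(majority_error(errs[-1], k))
    return errs

errs = error_by_level(delta=0.1, k=3, depth=8)
exponents = [-log(e) for e in errs]
# Ratio of consecutive exponents; for k = 3 it should approach floor((k+1)/2) = 2.
ratios = [b / a for a, b in zip(exponents, exponents[1:])]
print([round(r, 3) for r in ratios])
```

For $k = 3$ the error exponent roughly doubles per level, while the number of leaves triples, which is the gap between $\lfloor (k+1)/2 \rfloor^t$ and $n = k^t$.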

Our main result shows that, in fact, this is essentially the 
best that can be achieved. 

Theorem 3.1: Fix the private signal distribution, i.e., fix $p_0(\cdot)$ and $p_1(\cdot)$. There exists $C < \infty$ such that for all $k \in \mathbb{N}$ and $t \in \mathbb{N}$, for any combination of decision rules at the nodes, we have

$$P(\sigma_0 \neq s) \ge \exp\left\{-C \left(\frac{k + 1}{2}\right)^{t}\right\}. \tag{6}$$

In particular, the error probability decays subexponentially in the number of private signals $n = k^t$, even with the optimal protocol.

A. Proof of Theorem 3.1

We prove the theorem for the case $\pi_0 = \pi_1 = 1/2$, $\mathcal{X} = \{0, 1\}$ and $p_s(x) = (1 - \delta)\,\mathbb{I}(x = s) + \delta\,\mathbb{I}(x \neq s)$ for $s = 0, 1$, where $\delta \in (0, 1/2)$. The proof easily generalizes to arbitrary $\pi$, $\mathcal{X}$, $p_0$ and $p_1$.

Also, without loss of generality we can assume that, for every node $i$,

$$\frac{P(s = 1 \mid \sigma_i = 1)}{P(s = 0 \mid \sigma_i = 1)} \ge \frac{P(s = 1 \mid \sigma_i = 0)}{P(s = 0 \mid \sigma_i = 0)} \tag{7}$$

(otherwise simply exchange the symbols and modify the decision rules accordingly).

Denote by $\eta_i^{\rm I}$ the (negative) logarithm of the 'type I error' in $\sigma_i$, i.e. $\eta_i^{\rm I} \equiv -\log P(s = 0, \sigma_i = 1)$. Denote by $\eta_i^{\rm II}$ the (negative) logarithm of the 'type II error' in $\sigma_i$, i.e. $\eta_i^{\rm II} \equiv -\log P(s = 1, \sigma_i = 0)$.

The following is the key lemma in our proof of Theorem 3.1.

Lemma 3.2: Given $\delta > 0$, there exists $C = C(\delta) > 0$ such that for any $k$ we have the following: There exists an optimal set of decision rules such that for any node $i$ at level $\tau \in \mathbb{N}$,

$$\eta_i^{\rm I}\, \eta_i^{\rm II} \le C^2 \left(\frac{k + 1}{2}\right)^{2\tau}. \tag{8}$$

Proof: [Proof of Theorem 3.1] Applying Lemma 3.2 to the root 0, we see that $\min(\eta_0^{\rm I}, \eta_0^{\rm II}) \le C((k + 1)/2)^{t}$. The result follows immediately. ■

Lemma 3.2 is proved using the fact that there is an optimal set of decision rules that correspond to deterministic likelihood ratio tests (LRTs) at the non-leaf nodes.

Definition 3.3: Choose a node $i$. Fix the decision functions of all descendants of $i$. Define $L_i(\sigma_{\partial i}) = P(H_1 \mid \sigma_{\partial i}) / P(H_0 \mid \sigma_{\partial i})$.

a) The decision function $f_i$ is a monotone deterministic likelihood ratio test if:

(i) It is deterministic.

(ii) There is a threshold $\theta$ such that

$$P(f_i = 1, L_i < \theta) = 0, \qquad P(f_i = 0, L_i > \theta) = 0.$$

b) The decision function $f_i$ is a deterministic likelihood ratio test if either $f_i$ or $f_i^c$ is a monotone deterministic likelihood ratio test. Here $f_i^c$ is the Boolean complement of $f_i$.
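To make Definition 3.3 concrete, the following sketch builds one monotone deterministic LRT for a node with binary-valued children. It assumes a uniform prior (so the posterior ratio equals the likelihood ratio) and conditionally independent child messages; the parameter names `p1` and `p0` and the tie convention at $L_i = \theta$ are illustrative assumptions.

```python
from itertools import product

def likelihood_ratio(sigma, p1, p0):
    """L(sigma) = P(H1 | sigma) / P(H0 | sigma) under a uniform prior, when
    child c sends 1 with probability p1[c] under H1 and p0[c] under H0."""
    num = den = 1.0
    for bit, a, b in zip(sigma, p1, p0):
        num *= a if bit else 1 - a
        den *= b if bit else 1 - b
    return num / den

def monotone_lrt(theta, p1, p0):
    """Deterministic rule that outputs 1 exactly when L(sigma) > theta."""
    return lambda sigma: 1 if likelihood_ratio(sigma, p1, p0) > theta else 0

# Three children, each a copy of s flipped with probability 0.1.
p1, p0 = [0.9] * 3, [0.1] * 3
f = monotone_lrt(1.0, p1, p0)
table = {sigma: f(sigma) for sigma in product((0, 1), repeat=3)}
print(table)
```

With identical children and threshold $\theta = 1$, this particular test coincides with the majority rule of Section III.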

The next lemma is an easy consequence of a beautiful result of Tsitsiklis [14]. Though we state it here only for binary message alphabet, it easily generalizes to arbitrary finite $\mathcal{M}$.

Lemma 3.4: There is a set of monotone deterministic likelihood ratio tests at the nodes that achieve the minimum possible $P(\sigma_0 \neq s)$.

Proof: Consider a set of decision rules that minimize $P(\sigma_0 \neq s)$.

Fix the rule at every node except node $i$ to the optimal one. Now, the distributions $P_0(\sigma_{\partial i})$ and $P_1(\sigma_{\partial i})$ are fixed. Moreover, $P(\sigma_0 \neq s)$ is a linear function of $q(f_i) \equiv (P_0(\sigma_i), P_1(\sigma_i))$, where $P_s(\sigma_i)$ denotes the distribution of $\sigma_i$ under hypothesis $H_s$. The set $Q$ of achievable $q$'s is clearly convex, since randomized $f_i$ is allowed. From [14, Proposition 3.1], we also know that $Q$ is compact. Thus, there exists an extreme point of $Q$ that minimizes $P(\sigma_0 \neq s)$. Now [14, Proposition 3.2] tells us that any extreme point of $Q$ can be achieved by a deterministic LRT. Thus, we can change $f_i$ to a deterministic LRT without increasing $P(\sigma_0 \neq s)$. If $f_i$ is not monotone (we know that $i \neq 0$ in this case), then we set $f_i \leftarrow f_i^c$ and $f_j(\sigma_i, \sigma_{\partial j \setminus i}) \leftarrow f_j(\sigma_i^c, \sigma_{\partial j \setminus i})$, where $j$ is the parent of $i$. Clearly, $P(\sigma_0 \neq s)$ is unaffected by this transformation, and $f_i$ is now a monotone rule.

We do this at each of the nodes sequentially, starting at level 0, then covering level 1 and so on until the root 0. Thus, we change (if required) each decision rule to a monotone deterministic LRT without increasing $P(\sigma_0 \neq s)$. The result follows. ■

Clearly, if $f_i$ is a monotone LRT, Eq. (7) holds. In fact, we argue that there is a set of deterministic monotone LRTs with strict inequality in Eq. (7), i.e., such that

$$\frac{P(s = 1 \mid \sigma_i = 1)}{P(s = 0 \mid \sigma_i = 1)} > \frac{P(s = 1 \mid \sigma_i = 0)}{P(s = 0 \mid \sigma_i = 0)} \tag{9}$$

holds for all $i$, that are optimal.

Eq. (9) can only be written when $P(\sigma_i = 0) > 0$ and $P(\sigma_i = 1) > 0$. Consider a leaf node $i$. Without loss of generality we can take $\sigma_i = x_i$ for each leaf node $i$ (since any other rule can be 'simulated' by the concerned level 1 node). So we have $P(\sigma_i = 0) > 0$ and $P(\sigma_i = 1) > 0$, Eq. (9) holds and $f_i$ is a deterministic LRT. We can ensure these properties inductively at all levels of the tree by moving from the leaves towards the root. Consider any node $i$. If $P(\sigma_i = 0) = 0$, then $i \neq 0$ (else $P_{\rm err} = 1/2$) and the parent of $i$ is ignoring the constant message received from $i$. We can do at least as well by using any non-trivial monotone deterministic LRT at $i$. Similarly, we can eliminate $P(\sigma_i = 1) = 0$. If $P(\sigma_i = 0) > 0$ and $P(\sigma_i = 1) > 0$, then Eq. (9) must hold for any monotone deterministic LRT $f_i$, using the inductive hypothesis.

Definition 3.5: Let $\alpha$ and $\beta$ be binary vectors of the same length $\tau$. We say $\alpha \succeq \beta$ if $\alpha_i \ge \beta_i$ for all $i \in \{1, 2, \dots, \tau\}$.

We now prove Lemma 3.2.

Proof: [Proof of Lemma 3.2]

From Lemma 3.4 and Eq. (9), we can restrict attention to monotone deterministic LRTs satisfying Eq. (9).

We proceed via induction on level $\tau$. For any leaf node $i$, we know that $\eta_i^{\rm I} = \eta_i^{\rm II} = -\log(\delta/2)$. Choosing $C = -\log(\delta/2)$, Eq. (8) clearly holds for all nodes at level 0. Suppose Eq. (8) holds for all nodes at level $\tau$. Let $i$ be a node at level $\tau + 1$. Let its children be $\partial i = \{c_1, c_2, \dots, c_k\}$. Without loss of generality, assume

$$\eta_{c_1}^{\rm I} \ge \eta_{c_2}^{\rm I} \ge \dots \ge \eta_{c_k}^{\rm I}. \tag{10}$$

Claim: We can also assume

$$\eta_{c_1}^{\rm II} \le \eta_{c_2}^{\rm II} \le \dots \le \eta_{c_k}^{\rm II}. \tag{11}$$

Proof of Claim: Suppose, instead, $\eta_{c_1}^{\rm II} > \eta_{c_2}^{\rm II}$ (so $c_1$ is doing better than $c_2$ on both types of error). We can use the protocol on the subtree of $c_1$ also on the subtree of $c_2$. Call the message of $c_2$ under this modified protocol $\hat{\sigma}_{c_2}$. Since $\eta_{c_1}^{\rm I} \ge \eta_{c_2}^{\rm I}$ and $\eta_{c_1}^{\rm II} > \eta_{c_2}^{\rm II}$ (both types of error have only become less frequent), there exists a randomized function $F : \{0, 1\} \to \{0, 1\}$, such that $P_s(F(\hat{\sigma}_{c_2}) = 1) = P_s(\sigma_{c_2} = 1)$ for $s = 0, 1$. Thus, node $i$ can use $f_i(\sigma_{c_1}, F(\hat{\sigma}_{c_2}), \sigma_{c_3}, \dots, \sigma_{c_k})$ to achieve the original values of $\eta_i^{\rm I}$ and $\eta_i^{\rm II}$, where $f_i$ is the decision rule being used at $i$ before. Clearly, the error probabilities at $i$, and hence at the root, stay unchanged with this. Thus, we can safely assume $\eta_{c_1}^{\rm II} \le \eta_{c_2}^{\rm II}$. Similarly, we can assume $\eta_{c_l}^{\rm II} \le \eta_{c_{l+1}}^{\rm II}$ for $l = 2, 3, \dots, k - 1$. Clearly, our transformations retained the property that nodes at levels $\tau + 1$ and below use deterministic LRTs satisfying Eq. (9). Similar to our argument for Eq. (9) above, we can make appropriate changes in the decision rules at levels above $\tau + 1$ so that they also use deterministic LRTs satisfying Eq. (9), without increasing error probability. This proves the claim.



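Recall that $f_i : \{0, 1\}^k \to \{0, 1\}$ is the decision rule at node $i$. Assume the first bit in the input corresponds to $\sigma_{c_1}$, the second corresponds to $\sigma_{c_2}$, and so on. Using Lemma 3.4 we can assume that $f_i$ implements a deterministic likelihood ratio test. Define the $k$-bit binary vectors $\omega^0 = (111 \dots 1)$, $\omega^1 = (011 \dots 1)$, ..., $\omega^k = (00 \dots 0)$. From Lemma 3.4 and Eq. (9), it follows that $f_i(\omega^j) = \mathbb{I}(j < j_0)$ for some $j_0 \in \{0, 1, \dots, k, k + 1\}$.

Claim: Without loss of generality, we can assume that $j_0 \neq 0$ and $j_0 \neq k + 1$.

Proof of Claim: Suppose $j_0 = 0$. It follows from Lemma 3.4 and Eq. (9) that $f_i(\sigma_{\partial i}) = 0$ for every possible $\sigma_{\partial i}$. If $i = 0$ then we have $P_{\rm err} \ge 1/2$. Suppose $i \neq 0$. Then $\sigma_i$ is a constant and is ignored by the parent of $i$. We cannot do worse by using an arbitrary non-trivial decision rule at $i$ instead. (The parent can always continue to ignore $\sigma_i$.) The case $j_0 = k + 1$ can be similarly eliminated. This proves the claim.

Thus, we can assume $j_0 \in \{1, \dots, k\}$ without loss of generality. Now $\sigma_{\partial i} \succeq \omega^{j_0 - 1}$ contribute to type I error and $\sigma_{\partial i} \preceq \omega^{j_0}$ contribute to type II error. It follows that

$$\eta_i^{\rm I} \le \sum_{j = j_0}^{k} \eta_{c_j}^{\rm I} \le (k - j_0 + 1)\, \eta_{c_{j_0}}^{\rm I}, \tag{12}$$

$$\eta_i^{\rm II} \le \sum_{j = 1}^{j_0} \eta_{c_j}^{\rm II} \le j_0\, \eta_{c_{j_0}}^{\rm II}, \tag{13}$$

where we have used the ordering on the error exponents (Eqs. (10) and (11)). Eqs. (12) and (13) lead immediately to

$$\frac{\eta_i^{\rm I}}{\eta_{c_{j_0}}^{\rm I}} + \frac{\eta_i^{\rm II}}{\eta_{c_{j_0}}^{\rm II}} \le k + 1. \tag{14}$$

Now, for any $x, y \ge 0$, we have $x + y \ge 2\sqrt{xy}$. Plugging $x = \eta_i^{\rm I}/\eta_{c_{j_0}}^{\rm I}$ and $y = \eta_i^{\rm II}/\eta_{c_{j_0}}^{\rm II}$, we obtain from Eq. (14)

$$\eta_i^{\rm I}\, \eta_i^{\rm II} \le \left(\frac{k + 1}{2}\right)^{2} \eta_{c_{j_0}}^{\rm I}\, \eta_{c_{j_0}}^{\rm II}. \tag{15}$$

By our induction hypothesis $\eta_{c_{j_0}}^{\rm I}\, \eta_{c_{j_0}}^{\rm II} \le C^2((k + 1)/2)^{2\tau}$. Thus, $\eta_i^{\rm I}\, \eta_i^{\rm II} \le C^2((k + 1)/2)^{2(\tau + 1)}$ as required. Induction completes the proof. ■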

IV. 'Node-oblivious' rules with non-binary messages

In this section we allow a general finite message alphabet $\mathcal{M}$ that need not be binary. However, we restrict attention to the case of node-oblivious rules: The decision rules $f_i$ at all nodes in the tree, except the leaves and the root, must be the same. We denote this 'internal node' decision rule by $f : \mathcal{M}^k \to \mathcal{M}$. Also, the decision rules used at each of the leaf nodes should be the same. We denote the leaf decision rule by $g : \mathcal{X} \to \mathcal{M}$. The decision rule at the root is denoted by $h \equiv f_0 : \mathcal{M}^k \to \{0, 1\}$. We call such $(f, g, h)$ a node-oblivious decision rule vector.

Define $m \equiv |\mathcal{M}|$. In Section IV-A we present a scheme that achieves

$$P(\sigma_0 \neq s) = \exp\{-\Omega(\{k(1 - 1/m)\}^{t})\}, \tag{16}$$

when the error probability in the private signals is sufficiently small. Next, under appropriate assumptions, we show that the decay of error probability must be sub-exponential in the number of private signals $k^t$.

A. An efficient scheme 

For convenience, we label the messages as

$$\mathcal{M} = \left\{ \frac{-m + 1}{2}, \frac{-m + 3}{2}, \dots, \frac{m - 1}{2} \right\}. \tag{17}$$

The labels have been chosen so as to be suggestive (in a quantitative sense, see below) of the inferred log-likelihood ratio. Further, we allow the messages to be treated as real numbers (corresponding to their respective labels) that can be operated on. In particular, the quantity $S_i \equiv \sum_{c \in \partial i} \sigma_c$ is well defined for a non-leaf node $i$.

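The node-oblivious decision rule we employ at a non-leaf node $i \neq 0$ is

$$f(\sigma_{\partial i}) = \begin{cases} -\dfrac{m - 1}{2} + \left\lfloor \dfrac{S_i/k + (m - 1)/2}{1 - 1/m} \right\rfloor & \text{if } S_i \le 0, \\[2ex] \dfrac{m - 1}{2} + \left\lceil \dfrac{S_i/k - (m - 1)/2}{1 - 1/m} \right\rceil & \text{if } S_i > 0. \end{cases} \tag{18}$$

Note that the rule is symmetric with respect to an inversion of sign, except that $S_i = 0$ is mapped to the message $1/2$ when $m$ is even.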

The rule $g(x_i)$ used at the leaves is simply $g(1) = (m - 1)/2$ and $g(0) = -(m - 1)/2$. The decision rule at the root is

$$h(\sigma_{\partial 0}) = \begin{cases} 1, & \text{if } S_0 \ge 0, \\ 0, & \text{otherwise.} \end{cases} \tag{19}$$

If we associate $H_0$ with negative quantities, and $H_1$ with positive quantities, then again, the rule at the leaves is symmetric, and the rule at the root is essentially symmetric (except for the case $S_0 = 0$).
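The three rules can be sketched in a few lines. The code below follows one reading of Eqs. (17)-(19): a floor-based requantization of $S_i/k$ when $S_i \le 0$, its mirror image when $S_i > 0$, and a root that decides 1 when $S_0 \ge 0$. The tie-break at the root, the binary signal alphabet, and all function names are assumptions made for illustration. Exact rational arithmetic is used so that the floor and ceiling are not affected by rounding.

```python
from fractions import Fraction
from math import floor, ceil

def make_rules(m, k):
    """Return (f, g, h) for alphabet size m and branching factor k."""
    half = Fraction(m - 1, 2)      # largest message label, (m-1)/2
    scale = 1 - Fraction(1, m)     # the factor 1 - 1/m

    def g(x):
        # Leaf rule: signal 1 -> (m-1)/2, signal 0 -> -(m-1)/2.
        return half if x == 1 else -half

    def f(children):
        # Internal rule: requantize the sum S_i of the k child messages.
        s = sum(children, Fraction(0))
        if s <= 0:
            return -half + floor((s / k + half) / scale)
        return half + ceil((s / k - half) / scale)

    def h(children):
        # Root rule: decide 1 when the sum is non-negative (assumed tie-break).
        return 1 if sum(children, Fraction(0)) >= 0 else 0

    return f, g, h

def aggregate(leaf_signals, m, k):
    """Run the rules bottom-up on a k-ary tree with k**t leaf signals."""
    f, g, h = make_rules(m, k)
    level = [g(x) for x in leaf_signals]
    while len(level) > k:
        level = [f(level[j:j + k]) for j in range(0, len(level), k)]
    return h(level)

print(aggregate([0, 0, 0, 0, 0, 0, 0, 0, 1], m=3, k=3))
```

In this reading, a single flipped leaf among nine moves its parent from $-1$ to the intermediate message $0$, and the root still decides 0; intermediate labels act as a coarse confidence level.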

Lemma 4.1: Consider the node-oblivious decision rule vector $(f, g, h)$ defined above. For $k \ge 2$ and $m \ge 3$, there exists $\delta_0 = \delta_0(m, k) > 0$ such that the following is true for all $\delta < \delta_0$:

(i) Under $H_0$, for node $i$ at level $\tau \ge 0$, we have

$$-\log P[\sigma_i = -(m - 1)/2 + l] \ge (l/m)\{k(1 - 1/m)\}^{\tau} \tag{20}$$

for $l = 1, 2, \dots, m - 1$.

(ii) Under $H_1$, for node $i$ at level $\tau \ge 0$, we have

$$-\log P[\sigma_i = (m - 1)/2 - l] \ge (l/m)\{k(1 - 1/m)\}^{\tau} \tag{21}$$

for $l = 1, 2, \dots, m - 1$.

Proof: We prove (i) here. The proof of (ii) is analogous. Assume $H_0$. Define $\gamma \equiv k(1 - 1/m)$ and $C \equiv k \log m/(k - 1)$. We show that, in fact, for suitable choice of $\delta_0$ the following holds: If $\delta < \delta_0$ then for any node $i$ at any level $\tau \ge 0$,

$$-\log P[\sigma_i = -(m - 1)/2 + l] \ge (l/m)\gamma^{\tau} + C. \tag{22}$$

We proceed by induction on $\tau$. Consider $i$ at level $\tau = 0$. We have $P_0[\sigma_i = -(m - 1)/2 + l] = 0$ for $l = 1, 2, \dots, m - 2$ and $P_0[\sigma_i = (m - 1)/2] = \delta$. Choosing $\delta_0 = \exp(-1 - C)$, we can ensure that Eq. (22) holds at level 0. Note that for $k \gg 1$, we have $\delta_0 \approx 1/(em)$.

Now suppose Eq. (22) holds at level $\tau$. Consider node $i$ at level $\tau + 1$. From Eq. (18), for $\sigma_i = -(m - 1)/2 + l$ we need

$$S_i \ge k[-(m - 1)/2 + l(1 - 1/m)]. \tag{23}$$

For every $\sigma_{\partial i} = (-(m - 1)/2 + l_1, -(m - 1)/2 + l_2, \dots, -(m - 1)/2 + l_k)$ such that Eq. (23) holds, we have $\sum_{j=1}^{k} l_j \ge kl(1 - 1/m)$. Thus,

$$P_0(\sigma_{\partial i}) \le \exp\Big(-kC - (1/m)\gamma^{\tau} \sum_{j=1}^{k} l_j\Big) \le \exp\big(-kC - (l/m)\gamma^{\tau + 1}\big). \tag{24}$$

Obviously, there are at most $m^k$ such $\sigma_{\partial i}$. Thus,

$$P_0[\sigma_i = -(m - 1)/2 + l] \le m^k \exp\big(-kC - (l/m)\gamma^{\tau + 1}\big) = \exp\big(-C - (l/m)\gamma^{\tau + 1}\big).$$

Thus, Eq. (22) holds at level $\tau + 1$. Induction completes the proof. ■
Theorem 4.2: For $k \ge 2$ and $m \ge 3$, there exists $\delta_0 = \delta_0(m, k) > 0$, and a node-oblivious decision rule vector, such that the following is true: For any $\delta < \delta_0$, we have

$$P[\sigma_0 \neq s] \le \exp\left\{-\frac{m - 1}{2m}\{k(1 - 1/m)\}^{t}\right\} = \exp\left\{-\frac{m - 1}{2m}\, n^{\rho}\right\} \tag{25}$$

with $\rho \equiv 1 + \log(1 - 1/m)/\log k$.

Proof: The theorem follows from Lemma 4.1 and the root decision rule Eq. (19).

Assume $H_0$. For every $\sigma_{\partial 0} = (-(m - 1)/2 + l_1, -(m - 1)/2 + l_2, \dots, -(m - 1)/2 + l_k)$ such that $S_0 \ge 0$, we have $\sum_{j=1}^{k} l_j \ge k(m - 1)/2 \ge k(1 - 1/m)(m - 1)/2$. From Lemma 4.1(i),

$$P_0(\sigma_{\partial 0} \mid H_0) \le \exp\Big(-kC - (1/m)\gamma^{t - 1} \sum_{j=1}^{k} l_j\Big) \le \exp\big(-kC - (m - 1)\gamma^{t}/(2m)\big), \tag{26}$$

where $\gamma \equiv k(1 - 1/m)$ and $C \equiv k \log m/(k - 1)$. Obviously, there are at most $m^k$ such $\sigma_{\partial 0}$. It follows that

$$P_0(\sigma_0 = 1 \mid H_0) \le m^k \exp\big(-kC - (m - 1)\gamma^{t}/(2m)\big) = \exp\big(-C - (m - 1)\gamma^{t}/(2m)\big).$$

Similarly, we can show

$$P_1(\sigma_0 = 0 \mid H_1) \le \exp\big(-C - (m - 1)\gamma^{t}/(2m)\big).$$

Combining, we arrive at

$$P(\sigma_0 \neq s) \le \exp\big(-C - (m - 1)\gamma^{t}/(2m)\big).$$

Recall that $C > 0$. Thus, we have proved the result. ■



B. Subexponential decay of error probability

Define $n \equiv k^t$, i.e., $n$ is the number of private signals received, one at each leaf. The scheme presented in the previous section allows us to achieve error probability that decays like $\exp(-\Omega(\{k(1 - 1/m)\}^{t})) = \exp(-\Omega(n^{\rho}))$, where $\rho = 1 + \log(1 - 1/m)/\log k \approx 1 - 1/(m \log k)$ for $m \gg 1$. In this section we show that under appropriate assumptions, error probability that decays exponentially in $n$, i.e., $\exp(-\Theta(n))$, is not achievable with node-oblivious rules.
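The identity $\{k(1 - 1/m)\}^t = n^{\rho}$ and the large-$m$ approximation of $\rho$ quoted above are easy to check numerically. The following is a small sketch; the function names are illustrative.

```python
from math import log, isclose

def achievable_exponent(m, k):
    """rho = 1 + log(1 - 1/m) / log k, the exponent reached by the scheme."""
    return 1 + log(1 - 1 / m) / log(k)

def large_m_approximation(m, k):
    """First-order approximation 1 - 1/(m log k), valid for large m."""
    return 1 - 1 / (m * log(k))

k, m, t = 3, 5, 6
rho = achievable_exponent(m, k)
n = k ** t
# n**rho should reproduce {k(1 - 1/m)}**t up to rounding.
print(n ** rho, (k * (1 - 1 / m)) ** t)
print(achievable_exponent(100, k), large_m_approximation(100, k))
```

The exponent is strictly below 1 for every finite $m$ and approaches 1 only as the alphabet grows.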

In this section we call the letters of the message alphabet $\mathcal{M} = \{1, 2, \dots, m\}$. For simplicity, we consider only deterministic node-oblivious rules, though our results and proofs extend easily to randomized rules.

We define here a directed graph $\mathcal{G}$ with vertex set $\mathcal{M}$ and edge set $\mathcal{E}$ that we define below. We emphasize that $\mathcal{G}$ is distinct from the tree on which information aggregation is occurring. There is a directed edge from node $\mu_i \in \mathcal{M}$ to node $\mu_j \in \mathcal{M}$ in $\mathcal{G}$ if there exists $\sigma \in \mathcal{M}^k$ such that $\mu_j$ appears at least once in $\sigma$ and $f(\sigma) = \mu_i$. Informally, $(\mu_i, \mu_j) \in \mathcal{E}$ if $\mu_i$ can be 'caused' by a message vector received from children that includes $\mu_j$. We call $\mathcal{G}$ the dependence graph.

We make the following irreducibility assumptions on the node-oblivious decision rule vectors $(f, g, h)$ under consideration (along with leaf and root decision rules).

Assumption 1: The dependence graph $\mathcal{G}$ is strongly connected. In other words, for any $\mu_i \in \mathcal{M}$ and $\mu_j \in \mathcal{M}$ such that $\mu_j \neq \mu_i$, there is a directed path from $\mu_i$ to $\mu_j$ in $\mathcal{G}$.
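For a small alphabet, the dependence graph and Assumption 1 can be checked by brute force. The sketch below enumerates all $m^k$ input vectors of an internal rule `f`, records an edge from the output to every symbol in the input, and tests strong connectivity by breadth-first search. The two example rules (majority and OR on a binary alphabet) and all function names are illustrative assumptions.

```python
from itertools import product
from collections import deque

def dependence_graph(f, alphabet, k):
    """edges[mu] = set of symbols nu with a directed edge mu -> nu, i.e. some
    input vector containing nu is mapped by f to mu."""
    edges = {mu: set() for mu in alphabet}
    for sigma in product(alphabet, repeat=k):
        edges[f(sigma)].update(sigma)
    return edges

def shortest_paths(edges, source):
    """Breadth-first search distances from source along directed edges."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in edges[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def is_strongly_connected(edges):
    return all(len(shortest_paths(edges, u)) == len(edges) for u in edges)

def max_distance(edges):
    """Largest shortest-path length over ordered pairs of distinct vertices."""
    return max(d for u in edges
               for v, d in shortest_paths(edges, u).items() if v != u)

majority = lambda sigma: 1 if sum(sigma) >= 2 else 0
logical_or = lambda sigma: max(sigma)
g_maj = dependence_graph(majority, [0, 1], 3)
g_or = dependence_graph(logical_or, [0, 1], 3)
print(is_strongly_connected(g_maj), is_strongly_connected(g_or))
```

The OR rule fails the check because the output 0 can only be caused by the all-zero input, so there is no edge from 0 to 1.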

Assumption 2: There exists a level $\tau_p \ge 0$ such that for node $i$ at level $\tau_p$, we have $P_0(\sigma_i = \mu) > 0$ for all $\mu \in \mathcal{M}$.

Note that $P_0(\sigma_i = \mu) > 0$ implies $P_1(\sigma_i = \mu) > 0$ by absolute continuity of $p_0(x_i)$ w.r.t. $p_1(x_i)$.

Assumption 3: There exists $\mu_- \in \mathcal{M}$, $\mu_+ \in \mathcal{M}$, $\eta > 0$ and $\tau_d$ such that, for all $\tau \ge \tau_d$ the following holds: For node $i$ at level $\tau$, we have $P_0(\sigma_i = \mu_-) \ge \eta$ and $P_1(\sigma_i = \mu_+) \ge \eta$.

In other words, we assume there is one 'dominant' message under each of the two possible hypotheses.

It is not hard to verify that for $k \ge 2$, $m \ge 3$ and $\delta < \delta_0(m, k)$ (where $\delta_0$ is the same as in Lemma 4.1 and Theorem 4.2), the scheme presented in the previous section satisfies all three of our assumptions. In other words, the assumptions are all satisfied in the regime where our scheme has provably good performance.

Definition 4.3: Consider a directed graph $\mathcal{G} = (\mathcal{V}, \mathcal{E})$ that is strongly connected. For $u, v \in \mathcal{V}$, let $d_{uv}$ be the length of the shortest path from $u$ to $v$. Then the diameter of $\mathcal{G}$ is defined as

$$\text{diameter}(\mathcal{G}) \equiv \max_{u \in \mathcal{V}}\ \max_{v \in \mathcal{V},\, v \neq u} d_{uv}.$$

Theorem 4.4: Fix $m$ and $k$. Consider any node-oblivious decision rule vector $(f, g, h)$ such that Assumptions 1, 2 and 3 are satisfied. Let $d$ be the diameter of the dependence graph $\mathcal{G}$. Then, there exists $C = C(f, m, k) < \infty$ such that we have

$$P[\sigma_0 \neq s] \ge \exp\{-C n^{\rho}\}, \tag{27}$$

where $\rho \equiv 1 + \dfrac{\log(1 - k^{-d})}{d \log k} < 1$.

Now $\mathcal{G}$ has $m$ vertices, so clearly $d \le m - 1$. The following corollary is immediate.

Corollary 4.5: Fix $m$ and $k$. Consider any node-oblivious decision rule vector $(f, g, h)$ such that Assumptions 1, 2 and 3 are satisfied. Then, there exists $C = C(f, m, k) < \infty$ such that we have

$$P[\sigma_0 \neq s] \ge \exp\{-C n^{\rho}\}, \tag{28}$$

where $\rho \equiv 1 + \dfrac{\log(1 - k^{-(m - 1)})}{(m - 1) \log k} < 1$.

Thus, we prove that under the above irreducibility assump- 
tions, the error must decay subexponentially in the number 
of private signals available at the leaves. 
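The exponent in the lower bound can be evaluated directly. The sketch below assumes the reading $\rho = 1 + \log(1 - k^{-d})/(d \log k)$, which is equivalent to $k^{d\rho} = k^d - 1$, and compares it at $d = m - 1$ with the exponent $1 + \log(1 - 1/m)/\log k$ achieved by the scheme of Section IV-A. Function names are illustrative.

```python
from math import log, isclose

def converse_exponent(d, k):
    """rho in the lower bound on error: 1 + log(1 - k^-d) / (d log k)."""
    return 1 + log(1 - k ** (-d)) / (d * log(k))

def achievable_exponent(m, k):
    """rho reached by the explicit scheme: 1 + log(1 - 1/m) / log k."""
    return 1 + log(1 - 1 / m) / log(k)

k = 3
for m in range(3, 9):
    low = achievable_exponent(m, k)
    high = converse_exponent(m - 1, k)
    print(m, round(low, 4), round(high, 6))
```

Both exponents lie strictly below 1, and the converse exponent increases with $d$, so the corollary's choice $d = m - 1$ gives the weakest of these bounds.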

Remark 4.6: We have $P_0(\sigma_{\partial 0} = (\mu_-, \dots, \mu_-)) \ge \eta^k$. It follows that we must have $f_0(\mu_-, \mu_-, \dots, \mu_-) = 0$ (else the probability of error is bounded below by $\eta^k/2$ for any $t$). Similarly, we must have $f_0(\mu_+, \mu_+, \dots, \mu_+) = 1$. In particular, $\mu_- \neq \mu_+$.

Lemma 4.7: If Assumption 2 holds, then for a node $i$ at any level $\tau \ge \tau_p$, we have $P_0(\sigma_i = \mu) > 0$ for all $\mu \in \mathcal{M}$.

Proof: It follows from Assumption 2 that for any $\mu \in \mathcal{M}$, there is some $\sigma^{\mu} \in \mathcal{M}^k$ such that $f(\sigma^{\mu}) = \mu$. We prove the lemma by induction on the level $\tau$. Let

$S_{\tau} \equiv$ 'For node $i$ at level $\tau$, $P_0(\sigma_i = \mu) > 0$ for all $\mu \in \mathcal{M}$.'

By assumption, $S_{\tau_p}$ holds. Suppose $S_{\tau}$ holds. Consider node $i$ at level $\tau + 1$. Consider any $\mu \in \mathcal{M}$. By the inductive hypothesis, we have $P_0(\sigma_{\partial i} = \sigma^{\mu}) > 0$. It follows that $P_0(\sigma_i = \mu) > 0$. Thus, $S_{\tau + 1}$ holds. ■

Lemma 4.8 can be thought of as a quantitative version of Lemma 4.7, showing that the probability of the least frequent message decays subexponentially.

Lemma 4.8: Suppose Assumptions 1, 2 and 3 are satisfied. Fix $s \in \{0, 1\}$. Consider a node $i$ at level $\tau$. Define $\zeta_{\tau} \equiv \min_{\mu \in \mathcal{M}} P(\sigma_i = \mu \mid H_s)$. Let $\tau_* = \max(\tau_p, \tau_d)$ (cf. Assumptions 2, 3). Let $d = \text{diameter}(\mathcal{G})$. There exists $C' = C'(f, m, k) < \infty$ such that for any $a \in \mathbb{N} \cup \{0\}$ and $b \in \{0, 1, \dots, d - 1\}$, we have

$$\zeta_{\tau_* + ad + b} \ge \exp\{-C'(k^d - 1)^{a}\}. \tag{29}$$



Proof: Assume $H_0$ holds, i.e. $s = 0$. The proof for $s = 1$ is analogous.

We prove that, in fact, the following stronger bound holds:

$$-\log(\zeta_{\tau_* + ad + b}) \le C'(k^d - 1)^{a} - \log(1/\eta)/(k^d - 2). \tag{30}$$

We proceed via induction on $a$. First consider $a = 0$. Consider a node $i$ at level $\tau_* + b$ for $b \in \{0, 1, \dots, d - 1\}$. Consider the descendants of node $i$ at level $\tau_*$. For any $\mu \in \mathcal{M}$, we know from Lemma 4.7 that there must be some assignment of messages to the descendants, such that $\sigma_i = \mu$. It follows that

$$\zeta_{\tau_* + b} \ge \zeta_{\tau_*}^{k^b}. \tag{31}$$

Thus, choosing $C' = k^{d - 1}(-\log \zeta_{\tau_*}) + \log(1/\eta)/(k^d - 2)$, we can ensure that Eq. (30) holds for $a = 0$ and all $b \in \{0, 1, \dots, d - 1\}$.

Now suppose Eq. (30) holds for some $a \in \mathbb{N} \cup \{0\}$. Consider a node $i$ at level $\tau_* + (a + 1)d + b$. Let $\mathcal{D}$ be the set of descendants of node $i$ at level $\tau_* + ad + b$. Note that $|\mathcal{D}| = k^d$. Consider any $\mu \in \mathcal{M}$. By Assumption 1 there is a directed path in $\mathcal{G}$ of length at most $d$ going from $\mu$ to $\mu_-$. By Remark 4.6 we know that $(\mu_-, \mu_-) \in \mathcal{E}$. It follows that there is a directed path in $\mathcal{G}$ of length exactly $d$ going from $\mu$ to $\mu_-$. Thus, there must be an assignment of messages $\sigma_{\mathcal{D}}$ to nodes in $\mathcal{D}$, including at least one occurrence of $\mu_-$, such that $\sigma_i = \mu$. Using Assumption 3 we deduce that

$$\zeta_{\tau_* + (a + 1)d + b} \ge \eta\, \zeta_{\tau_* + ad + b}^{\,k^d - 1}.$$

Rewriting as

$$-\log \zeta_{\tau_* + (a + 1)d + b} \le (k^d - 1)(-\log \zeta_{\tau_* + ad + b}) + \log(1/\eta),$$

and combining with Eq. (30), we obtain

$$-\log(\zeta_{\tau_* + (a + 1)d + b}) \le C'(k^d - 1)^{a + 1} - \log(1/\eta)/(k^d - 2).$$

Induction completes the proof. ■
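The induction step can be sanity-checked numerically. The sketch below assumes the recursion in the form $x_{a+1} = (k^d - 1)\,x_a + \log(1/\eta)$, i.e. the worst case of the recursive inequality with $x_a$ standing for $-\log \zeta_{\tau_* + ad + b}$, and compares the iterates with the closed form $C'(k^d - 1)^a - \log(1/\eta)/(k^d - 2)$. The starting value `x0` and all names are hypothetical, and $k^d > 2$ is assumed so that the denominator is positive.

```python
from math import log, isclose

def iterate_bound(x0, k, d, eta, steps):
    """Iterate x_{a+1} = (k^d - 1) * x_a + log(1/eta) starting from x0."""
    q, ell = k ** d - 1, log(1 / eta)
    xs = [x0]
    for _ in range(steps):
        xs.append(q * xs[-1] + ell)
    return xs

def closed_form(x0, k, d, eta, a):
    """C' (k^d - 1)^a - log(1/eta)/(k^d - 2), with C' fixed by the a = 0 case."""
    q, ell = k ** d - 1, log(1 / eta)
    c_prime = x0 + ell / (q - 1)
    return c_prime * q ** a - ell / (q - 1)

x0, k, d, eta = 1.5, 2, 2, 0.2
xs = iterate_bound(x0, k, d, eta, steps=6)
for a, x in enumerate(xs):
    print(a, x, closed_form(x0, k, d, eta, a))
```

The subtracted constant $\log(1/\eta)/(k^d - 2)$ is exactly what makes the bound reproduce itself from one value of $a$ to the next.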
Theorem 14.41 follows. 

Proof of Theorem 4.4: Assume $H_0$. From Lemma 4.8,

$$\mathbb{P}_0\big(\sigma_{\partial 0} = (\mu_+, \ldots, \mu_+)\big) \ge \exp\{-C n^{\rho}\}$$

for a constant $C < \infty$ determined by $C'$, $k$, $\tau_*$ and $d$. It follows that

$$\mathbb{P}_0(\sigma_0 = 1) \ge \exp\{-C n^{\rho}\}. \qquad (32)$$

Similarly,

$$\mathbb{P}_1(\sigma_0 = 0) \ge \exp\{-C n^{\rho}\}. \qquad (33)$$

The result follows. ■
Remark 4.9: For the scheme presented in Section IV-A,
we have $d \approx \log_k m$, where $d = \mathrm{diameter}(\mathcal{G})$. For any $\epsilon > 0$,
Theorem 4.4 provides a lower bound on error probability
with $\rho \le 1 - C_1/m^{1+\epsilon}$ for some $C_1 \equiv C_1(k, \epsilon) > 0$. This
closely matches the $m$ dependence of the upper bound on
error probability we proved in Theorem 4.2.
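
To get a feel for the scaling in Remark 4.9, the sketch below makes two assumptions that are not quoted from the paper: that the exponent takes the form rho = log(k^d - 1) / (d log k), which is what the factor (k^d - 1)^a in Lemma 4.8 suggests when a is about t/d and n = k^t, and that d = log_k m holds exactly. The function names are hypothetical.

```python
import math

def rho(k, d):
    """Assumed exponent: (k^d - 1)^(t/d) = (k^t)^rho."""
    return math.log(k**d - 1) / (d * math.log(k))

def gap_table(k, depths):
    """Rows of (m, 1 - rho, 1/(m ln m)) with m = k^d, i.e. d = log_k m."""
    rows = []
    for d in depths:
        m = k**d
        rows.append((m, 1.0 - rho(k, d), 1.0 / (m * math.log(m))))
    return rows

for m, gap, approx in gap_table(k=3, depths=[1, 2, 3, 4]):
    print(m, gap, approx)
```

Under these assumptions the gap 1 - rho stays within a factor of two of 1/(m ln m), which is consistent with a bound of the form 1 - C_1/m^{1+epsilon}.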

C. Discussion of the irreducibility assumptions 

We already mentioned that the efficient node-oblivious
rule presented in Section IV-A satisfies all of Assumptions
1, 2 and 3. Moreover, it is natural to expect that similar
schemes based on propagation of quantized likelihood ratio
estimates should also satisfy our assumptions. In this section,
we further discuss our assumptions taking the cases of binary
and ternary messages as examples.



1) Binary messages: Binary messages are not the focus
of Section IV-B. However, we present here a short discussion
of Assumptions 1, 2 and 3 in the context of binary messages
for illustrative purposes.

Claim: If $m = 2$, each of the irreducibility assumptions
must be satisfied by any node-oblivious rule for which error
probability decays to 0 with $t$.

Proof of Claim: Call the messages $\mathcal{M} = \{0, 1\}$. Consider a
node-oblivious decision rule vector $(f, g, h)$ such that error
probability decays to 0 with $t$. Then $g$ cannot be a constant
function (e.g., identically 0), since this leads to $P_{\rm err} \ge 1/2$.

Suppose Assumption 1 is violated. Without loss of generality,
suppose $(0, 1) \notin \mathcal{E}$. Then $f(\sigma) = 1$ for all $\sigma \ne
(0, 0, \ldots, 0)$. It follows that for node $i$ at level $\tau$, we have

$$\mathbb{P}_s(\sigma_i = 0) \le \exp(-\Theta(\tau)) \to 0, \qquad (34)$$

for both $s = 0$ and $s = 1$. In particular, $P_{\rm err}$ is bounded
away from 0. This is a contradiction.
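
A small numerical sketch of this degenerate rule, under assumptions stated here rather than taken from the paper: the k children of a node send conditionally independent messages given s, the rule satisfies f(0, ..., 0) = 0 (the case in which message 0 survives longest), and `p_leaf_zero` is a hypothetical probability of message 0 at level 0. A node then sends 0 only if all its children do, so the probability is raised to the power k at every level.

```python
def prob_zero(p_leaf_zero, k, tau):
    """Probability of message 0 at level tau when f outputs 0 only on all-zero input."""
    p = p_leaf_zero
    for _ in range(tau):
        p = p**k   # all k independent children must send 0
    return p

for tau in range(5):
    print(tau, prob_zero(0.9, k=2, tau=tau))
```

Even with a leaf probability as high as 0.9, the message 0 becomes rare within a few levels, under either hypothesis.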

Suppose Assumption 2 is violated. Then, wlog, all nodes
at level 1 transmit the message 1 almost surely, under either
hypothesis. Thus, all useful information is lost and $P_{\rm err} \ge
1/2$. This is a contradiction.

Finally, we show that Assumption 3 must hold as well.
Define $\xi_\tau \equiv \mathbb{P}_0(\sigma_i = 0)$ for node $i$ at level $\tau$. Wlog,
suppose $\xi_\tau \ge 1/2$ occurs infinitely often. Then we have
$h(0, 0, \ldots, 0) = 0$, else $P_{\rm err} \ge 2^{-k-1}$ for infinitely many
$t$. Define $\bar{\xi}_\tau \equiv \mathbb{P}_1(\sigma_i = 0)$ for node $i$ at level $\tau$. If $\bar{\xi}_\tau \ge
1/2$ occurs infinitely often, then it follows that $\mathbb{P}_1(\sigma_{\partial 0} =
(0, 0, \ldots, 0)) \ge 2^{-k}$ and hence $\mathbb{P}_1(\sigma_0 = 0) \ge 2^{-k}$ occur
for infinitely many $t$. So we can have $\bar{\xi}_\tau \ge 1/2$ only
finitely many times. Also, $h(1, 1, \ldots, 1) = 1$ must hold. It
follows that $\xi_\tau < 1/2$ occurs only finitely many times. Thus,
Assumption 3 holds with $\eta = 1/2$.

2) Ternary messages: By Theorem 4.2, the scheme presented
in Section IV-A achieves $P_{\rm err} = \exp\{-\Omega(\{2k/3\}^t)\}$
in the case of ternary messages.

We first show that if Assumption 2 is violated, then $P_{\rm err} =
\exp\{-O(\{(k+1)/2\}^t)\}$. If Assumption 2 does not hold,
then at most two letters are used at each level. It follows
that we can have a (possibly node-dependent) scheme with
binary messages that is equivalent to the original scheme at
levels 1 and higher. Our lower bound on $P_{\rm err}$ then follows
from Theorem 3.1. Thus, even in the best case, performance
is significantly worse than the scheme presented in Section
IV-A. So a good scheme for ternary messages must satisfy
Assumption 2.
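
The comparison between the two growth rates can be made concrete. The sketch below simply evaluates the two bases quoted above, 2k/3 for the ternary scheme and (k+1)/2 for the binary lower bound; the function names are hypothetical and the hidden constants in the exponents are ignored.

```python
def growth_rates(k):
    """Bases of the exponents: (ternary scheme, binary lower bound)."""
    ternary = 2 * k / 3
    binary = (k + 1) / 2
    return ternary, binary

def exponent_ratio(k, t):
    """Ratio (2k/3)^t / ((k+1)/2)^t of the two exponents at depth t."""
    ternary, binary = growth_rates(k)
    return (ternary / binary) ** t

for k in (3, 4, 6, 10):
    print(k, growth_rates(k), exponent_ratio(k, t=10))
```

By this arithmetic the two bases coincide at k = 3 and the ternary base is strictly larger for k > 3, so the gap between the exponents grows geometrically in t only for the larger branching factors.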

Now consider Assumption 1. Let $\mathcal{M} = \{-1, 0, 1\}$. Suppose
Assumption 1 is violated. Then wlog, there is no path
from letter 0 to one of the other letters. It follows that under
either hypothesis, we have $\mathbb{P}_s(\sigma_i = 0) = \exp\{-\Omega(k^\tau)\}$ for
node $i$ at level $\tau$. Thus, the letter 0 occurs with exponentially
small probability, irrespective of $s$. This should essentially
reduce, then, to the case of binary messages, and we expect
performance to be constrained as above.

Finally, consider Assumption 3. We cannot have
$h(\mu, \mu, \mu) = 0$ for all $\mu \in \mathcal{M}$, since that will lead to
$\mathbb{P}_1(\sigma_0 \ne s) \ge 1/9$ for all $t$. Similarly, we can also exclude
the possibility $h(\mu, \mu, \mu) = 1$ for all $\mu \in \mathcal{M}$. Wlog, suppose
$h(-1, -1, -1) = 0$ and $h(1, 1, 1) = 1$. Now consider the
problem of designing a good aggregation protocol. By the
above, we must have $\mathbb{P}_1(\sigma_i = -1)$ and $\mathbb{P}_0(\sigma_i = 1)$, for
node $i$ at level $\tau$, each converge to 0 with increasing
$\tau$. Further, it appears natural to use the message $\mu = 0$
with an interpretation of 'not sure' in such a situation. We
would then like the probability of this intermediate symbol
to decay with $\tau$, or at least be bounded in the limit, i.e.,
$\limsup_{\tau \to \infty} \mathbb{P}_s(\sigma_i = 0) < 1$ for each possible $s$. If this
holds, we immediately have Assumption 3 (with $\mu_- = -1$
and $\mu_+ = 1$).
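
The constant 1/9 quoted above can be sanity-checked numerically. The sketch below assumes, as the three-argument form of h suggests, that the root has three children, and that their messages are independent and identically distributed given s with some distribution p over the three letters. It then searches a grid on the probability simplex for the smallest probability that all three children send the same letter; the function name is hypothetical.

```python
def min_all_equal_prob(grid=60):
    """Minimum over a simplex grid of p_1^3 + p_2^3 + p_3^3."""
    best = 1.0
    for i in range(grid + 1):
        for j in range(grid + 1 - i):
            p = (i / grid, j / grid, (grid - i - j) / grid)
            best = min(best, sum(x**3 for x in p))
    return best

print(min_all_equal_prob())
```

The minimum is attained at the uniform distribution, where the three children agree with probability 3 * (1/3)^3 = 1/9.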

3) Need for assumptions: We argued above that our
irreducibility assumptions are quite reasonable in various
circumstances. In fact, we expect the assumptions to be a proof
artifact, and conjecture that a subexponential convergence
bound holds for general node-oblivious rules. A possible
approach to eliminate our assumptions would be to prune the
message alphabet $\mathcal{M}$, discarding letters that never appear, or
appear with probability bounded by $\exp(-\Omega(k^t))$ (because
they require descendants from a strict subset of $\mathcal{M}$).